{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Inferential statistics\n", "Often we want to do more than describe our data with descriptive statistics such as the mean and standard deviation: we want to know whether two or more sets of measurements are likely to come from the same underlying distribution. In other words, we want to draw inferences from the data, which is what inferential statistics is about.\n", "\n", "To learn how to do this in Python, let's use some example data:\n", "\n", "To test whether a new wonder drug improves eyesight, Linda and Anabel ran the following experiment with student subjects:\n", "\n", "Experimental subjects were injected with a saline solution containing 1 nM of the wonder drug. Control subjects were injected with saline alone.\n", "The drug is effective for only about an hour, so eyesight was scored by testing the subjects' ability to read small text within one hour of the injection.\n", "\n", "However, Linda and Anabel used two different experimental designs:\n", "1. Linda tested each student on ten consecutive days and measured their performance after each injection. She used 50 control subjects (saline only) and 50 experimental subjects (saline + drug): 100 subjects in total, with 10 measurements (score after injection) per subject.\n", "2. Anabel performed only a single test per subject, but measured eyesight 30 minutes before and 30 minutes after the treatment.\n",
"She tested 60 different subjects, with two measurements each (score before and score after injection).\n", "\n", "Our task is now to determine whether the wonder drug really improves eyesight in these two experiments.\n", "\n", "Let's start with the first dataset:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import scipy.stats\n", "\n", "plt.style.use('ncb.mplstyle')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
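Anabel's design gives two scores per subject, so her before and after scores are paired rather than independent. The sketch below is a minimal illustration under stated assumptions: it simulates 60 subjects (matching her sample size), but the baseline spread, the +1 point drug effect, the noise level and the random seed are invented, not her data. It compares a paired t-test (`scipy.stats.ttest_rel`) with an unpaired test (`scipy.stats.ttest_ind`) that ignores the pairing:

```python
import numpy as np
import scipy.stats

# Simulated stand-in for Anabel's data (assumption: not the real measurements).
rng = np.random.default_rng(seed=0)
n_subjects = 60

# Each subject has a personal baseline; the between-subject spread is large.
baseline = rng.normal(loc=10.0, scale=3.0, size=n_subjects)
# Hypothetical drug effect of +1 point plus small within-subject noise.
score_before = baseline + rng.normal(0.0, 1.0, size=n_subjects)
score_after = baseline + 1.0 + rng.normal(0.0, 1.0, size=n_subjects)

# Paired test: works on per-subject differences, so the baseline cancels out.
paired = scipy.stats.ttest_rel(score_after, score_before)
# Unpaired test: treats before/after as two independent groups (ignores pairing).
unpaired = scipy.stats.ttest_ind(score_after, score_before)

print(f'paired:   t = {paired.statistic:.2f}, p = {paired.pvalue:.3g}')
print(f'unpaired: t = {unpaired.statistic:.2f}, p = {unpaired.pvalue:.3g}')
```

Both tests use the same mean difference in the numerator, so with equal group sizes the paired t statistic is larger in magnitude whenever before and after scores are positively correlated across subjects, which is what a shared per-subject baseline produces.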
<table>
<thead>
<tr><th></th><th>animal</th><th>treatment</th><th>score_after</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>0</td><td>0</td><td>10.053951</td></tr>
<tr><th>1</th><td>0</td><td>0</td><td>5.894092</td></tr>
<tr><th>2</th><td>0</td><td>0</td><td>13.447026</td></tr>
<tr><th>3</th><td>0</td><td>0</td><td>6.579613</td></tr>
<tr><th>4</th><td>0</td><td>0</td><td>8.482990</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>
<tr><th>995</th><td>99</td><td>1</td><td>12.000260</td></tr>
<tr><th>996</th><td>99</td><td>1</td><td>12.277938</td></tr>
<tr><th>997</th><td>99</td><td>1</td><td>13.718489</td></tr>
<tr><th>998</th><td>99</td><td>1</td><td>14.272301</td></tr>
<tr><th>999</th><td>99</td><td>1</td><td>13.211777</td></tr>
</tbody>
</table>
<p>1000 rows × 3 columns</p>
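In Linda's design the 1000 rows are not 1000 independent observations: each of the 100 subjects contributes 10 scores, and scores from the same subject are likely to be correlated. One common way to respect this is to collapse each subject's repeated scores into a single value before testing. A minimal sketch, assuming a DataFrame `df` with the columns shown in the table above (`animal`, `treatment`, `score_after`); the data are simulated, and the baseline spread, the +2 point drug effect, the noise level and the seed are made-up values:

```python
import numpy as np
import pandas as pd

# Simulated stand-in for Linda's data (assumption: all numbers are made up).
rng = np.random.default_rng(seed=1)
n_subjects, n_days = 100, 10

# One row per subject per day: subject ids 0..99, each repeated 10 times.
animal = np.repeat(np.arange(n_subjects), n_days)
# First 50 subjects are controls (0), last 50 received the drug (1).
treatment = animal // (n_subjects // 2)

# Each subject has its own baseline; the drug adds a hypothetical +2 points,
# and every daily score gets independent day-to-day noise.
subject_baseline = rng.normal(10.0, 2.0, size=n_subjects)
score_after = (subject_baseline[animal] + 2.0 * treatment
               + rng.normal(0.0, 2.0, size=n_subjects * n_days))

df = pd.DataFrame({'animal': animal, 'treatment': treatment,
                   'score_after': score_after})

# Collapse the 10 repeated scores into one mean score per subject.
per_subject = (df.groupby('animal')
                 .agg(treatment=('treatment', 'first'),
                      score_after=('score_after', 'mean'))
                 .reset_index())

print(df.shape)           # (1000, 3): 100 subjects x 10 days
print(per_subject.shape)  # (100, 3): one row per subject
print(per_subject.groupby('treatment')['score_after'].agg(['mean', 'std', 'count']))
```

A test run on `per_subject` then has 50 observations per group instead of 500, matching the number of independent units (subjects) in the experiment. Averaging discards the day-to-day variability within each subject, which is the price of this simplification.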
<table>
<thead>
<tr><th></th><th>animal</th><th>treatment</th><th>score_after</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>0</td><td>0</td><td>9.097427</td></tr>
<tr><th>1</th><td>1</td><td>0</td><td>15.234215</td></tr>
<tr><th>2</th><td>2</td><td>0</td><td>10.425896</td></tr>
<tr><th>3</th><td>3</td><td>0</td><td>11.675589</td></tr>
<tr><th>4</th><td>4</td><td>0</td><td>13.972324</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>
<tr><th>95</th><td>95</td><td>1</td><td>13.760092</td></tr>
<tr><th>96</th><td>96</td><td>1</td><td>13.203628</td></tr>
<tr><th>97</th><td>97</td><td>1</td><td>16.347525</td></tr>
<tr><th>98</th><td>98</td><td>1</td><td>16.062462</td></tr>
<tr><th>99</th><td>99</td><td>1</td><td>13.570623</td></tr>
</tbody>
</table>
<p>100 rows × 3 columns</p>
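Once each subject is reduced to one score, as in the 100-row table above, the control and drug groups are independent samples and a two-sample test applies. A minimal sketch, assuming per-subject scores in a DataFrame with the columns shown (`animal`, `treatment`, `score_after`); the values are simulated with a made-up +2 point effect, and the choice of Welch's t-test (`equal_var=False`) plus a Mann-Whitney U test as a rank-based check is illustrative rather than prescribed by this notebook:

```python
import numpy as np
import pandas as pd
import scipy.stats

# Simulated per-subject mean scores (assumption: values are made up).
rng = np.random.default_rng(seed=2)
n_subjects = 100
treatment = np.arange(n_subjects) // (n_subjects // 2)  # 0 = saline, 1 = drug
# Hypothetical +2 point drug effect on top of between-subject variability.
score_after = rng.normal(10.0, 2.0, size=n_subjects) + 2.0 * treatment

per_subject = pd.DataFrame({'animal': np.arange(n_subjects),
                            'treatment': treatment,
                            'score_after': score_after})

# Split into the two independent groups (one value per subject).
control = per_subject.loc[per_subject['treatment'] == 0, 'score_after']
drug = per_subject.loc[per_subject['treatment'] == 1, 'score_after']

# Welch's t-test: compares group means without assuming equal variances.
welch = scipy.stats.ttest_ind(drug, control, equal_var=False)
# Mann-Whitney U test: rank-based alternative that does not assume normality.
mwu = scipy.stats.mannwhitneyu(drug, control, alternative='two-sided')

print(f'Welch: t = {welch.statistic:.2f}, p = {welch.pvalue:.3g}')
print(f'Mann-Whitney: U = {mwu.statistic:.0f}, p = {mwu.pvalue:.3g}')
```

Welch's test is a reasonable default when the two groups may differ in variance; the Mann-Whitney U test gives up some power under normality in exchange for robustness to skewed or heavy-tailed score distributions.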
\n", "